SNA data introduction

Session 4a

Author: Zixi Chen, PhD

Affiliation: NYU-Shanghai

Published: November 13, 2025

1 Network data in Tidyverse

Now that we know the structure of network data, we can turn to the basics of social network analysis using the tidygraph package in R.

tidygraph provides a tidy framework for working with network data, making it easy to manipulate and visualize data through the interfaces defined in the dplyr and ggplot2 packages. It also provides tidy interfaces to many other established SNA packages in R, such as igraph.

# install.packages("tidygraph")
library(tidygraph)
library(tidyverse)

1.1 Creating a graph object

Working with network data normally starts with creating a tbl_graph object. The tbl_graph object provides a structured way to store both node (i.e., actor) and edge (i.e., tie) data in a single object.

It consists of two data frames: one for nodes and another for edges. This structured representation makes it easier to work with graph data and ensures that the data is organized and consistent.

Let’s start by inspecting a toy network data set and converting it into a tbl_graph object.

You can find the two data files for this toy network in the “SNA toy data” folder of our class Google Drive:

“toynet.csv”: the edgelist data containing information about the edges in the network
“toyatt.csv”: the node data containing information about the nodes in the network

toy_nodes <- read.csv("toyatt.csv")
toy_edgelist <- read.csv("toynet.csv")

After reading in the data sets, we convert them into a graph object using the tbl_graph() function.

ex.nw <- tbl_graph(nodes = toy_nodes,
                   edges = toy_edgelist,
                   directed = TRUE) # TRUE if it is a directed network

ex.nw

# A tbl_graph: 6 nodes and 12 edges
#
# A directed simple graph with 1 component
#
# Node Data: 6 × 4 (active)
   node attr1 attr2 attr3
  <int> <dbl> <dbl> <int>
1     1   2.4   2       1
2     2   2.6   2       1
3     3   1.1   1       1
4     4  -0.5  -0.5     0
5     5  -3    -2       0
6     6  -1     0.5     0
#
# Edge Data: 12 × 3
   from    to weight
  <int> <int>  <int>
1     1     2      1
2     1     3      1
3     2     1      1
# ℹ 9 more rows

class(ex.nw)

[1] "tbl_graph" "igraph"
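Because the object carries both classes, functions from igraph should accept it directly. The following is a minimal sketch on a hypothetical three-node graph (the object `g` and its data are made up for illustration and are not part of the toy data):

```r
library(tidygraph)

# Hypothetical three-node directed chain: a -> b -> c
g <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c")),
  edges = data.frame(from = c(1, 2), to = c(2, 3)),
  directed = TRUE
)

class(g)                # includes both "tbl_graph" and "igraph"
igraph::vcount(g)       # number of nodes
igraph::ecount(g)       # number of edges
igraph::is_directed(g)  # whether ties have a direction
```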

1.2 Nodes

We can extract the node and edge data from this graph object.

The node data stores all the relevant information about the nodes. It functions much like the metadata associated with documents in text-as-data analyses.

as.list(ex.nw)$nodes

# A tibble: 6 × 4
   node attr1 attr2 attr3
  <int> <dbl> <dbl> <int>
1     1   2.4   2       1
2     2   2.6   2       1
3     3   1.1   1       1
4     4  -0.5  -0.5     0
5     5  -3    -2       0
6     6  -1     0.5     0

1.3 Edgelist

The edge data is the edgelist. Its “from” and “to” columns represent the edges (i.e., ties) between nodes (i.e., actors), and any further column, such as “weight” here, stores an edge attribute.

as.list(ex.nw)$edges

# A tibble: 12 × 3
    from    to weight
   <int> <int>  <int>
 1     1     2      1
 2     1     3      1
 3     2     1      1
 4     2     3      1
 5     3     2      1
 6     3     4      1
 7     3     6      1
 8     4     5      1
 9     4     6      1
10     5     4      1
11     6     3      1
12     6     4      1
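If the CSV files are not at hand, the toy network can be rebuilt from the values printed above. This is only a sketch: the object names ending in `_inline` are introduced here for illustration, and the values are copied from the node and edge tables shown in this section.

```r
library(tidygraph)

# Node attributes copied from the printed node table
toy_nodes_inline <- data.frame(
  node  = 1:6,
  attr1 = c(2.4, 2.6, 1.1, -0.5, -3, -1),
  attr2 = c(2, 2, 1, -0.5, -2, 0.5),
  attr3 = c(1L, 1L, 1L, 0L, 0L, 0L)
)

# Edgelist copied from the printed edge table; every tie has weight 1
toy_edgelist_inline <- data.frame(
  from   = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 6L, 6L),
  to     = c(2L, 3L, 1L, 3L, 2L, 4L, 6L, 5L, 6L, 4L, 3L, 4L),
  weight = 1L
)

# Integer from/to values refer to row positions in the node table
ex.nw_inline <- tbl_graph(nodes = toy_nodes_inline,
                          edges = toy_edgelist_inline,
                          directed = TRUE)
ex.nw_inline
```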

2 A case study: Harry Potter peer support networks

In this section, we walk through a case study of the peer-support networks in the magical world of Harry Potter. The data were made available by Goele Bossaert and Nadine Meidert (see here). A peer-support tie represents voluntary emotional, instrumental, or informational support, or praise, from one living, adolescent character to another within the book’s pages. Characters’ attributes are also included: name, schoolyear, gender, and the house assigned by the Sorting Hat.

# install.packages("manynet")
library(manynet)

data(fict_potter) # In older versions of `manynet`, the HP data is called `ison_potter`

Let’s look at the basic information about this network data. The object has three classes (i.e., types of network objects), so it can be used directly by functions from the manynet, tidygraph, and igraph packages. Throughout this class, we will focus on the tidygraph way.

class(fict_potter)

[1] "mnet"      "tbl_graph" "igraph"

fict_potter

── # Harry Potter support network ──────────────────────────────────────────────

# A longitudinal, labelled, complex, directed network of 64 students and 544
support arcs over 6 waves

── Nodes

# A tibble: 64 × 5
  name              schoolyear gender house      active
  <chr>                  <int> <chr>  <chr>      <logi>
1 Adrian Pucey            1989 male   Slytherin  TRUE  
2 Alicia Spinnet          1989 female Gryffindor TRUE  
3 Angelina Johnson        1989 female Gryffindor TRUE  
4 Anthony Goldstein       1991 male   Ravenclaw  TRUE  
# ℹ 60 more rows

── Changes

# A tibble: 81 × 4
   time  node var    value
  <int> <int> <chr>  <lgl>
1     2     9 active TRUE 
2     2    21 active TRUE 
3     2    35 active TRUE 
4     2    39 active FALSE
# ℹ 77 more rows

── Ties

# A tibble: 544 × 3
   from    to  wave
  <int> <int> <dbl>
1    11    11     1
2    11    25     1
3    11    26     1
4    11    44     1
# ℹ 540 more rows

How many support relationships exist in each book (defined by the “wave” variable)?

# tie distribution across the books
fict_potter %>%
  activate(edges) %>%
  as_tibble() %>%  # convert the edgelist to a data frame before using functions that require a rectangular data structure
  group_by(wave) %>% 
  summarize(support.tie.count=n())

# A tibble: 6 × 2
   wave support.tie.count
  <dbl>             <int>
1     1                47
2     2               110
3     3               104
4     4                49
5     5               160
6     6                74
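The same kind of per-wave tally can also be written with dplyr's `count()`, which combines the grouping and summarizing steps. Below is a minimal sketch on a hypothetical three-node network with a made-up `wave` column, not on the Harry Potter data:

```r
library(tidygraph)
library(dplyr)

# Hypothetical network: four ties spread over two waves
g <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c")),
  edges = data.frame(from = c(1, 2, 3, 1),
                     to   = c(2, 3, 1, 3),
                     wave = c(1, 1, 2, 2)),
  directed = TRUE
)

ties_per_wave <- g %>%
  activate(edges) %>%
  as_tibble() %>%                            # edge table as a plain tibble
  count(wave, name = "support.tie.count")    # one row per wave

ties_per_wave
```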

In the following demonstration, we will use the support network from the sixth Harry Potter book (“Harry Potter and the Half-Blood Prince”) and name it hp.6.

As the output below shows, it is a directed network with 64 actors and 74 ties.

hp.6 <- fict_potter %>%
  activate(edges) %>%
  filter(wave == 6) # `filter()` can be directly applied to the edgelist. Similar functions include `arrange()` and `mutate()`.

hp.6

── # Harry Potter support network ──────────────────────────────────────────────

# A longitudinal, labelled, complex, directed network of 64 students and 74
support arcs over 6 waves

── Nodes

# A tibble: 64 × 5
  name              schoolyear gender house      active
  <chr>                  <int> <chr>  <chr>      <logi>
1 Adrian Pucey            1989 male   Slytherin  TRUE  
2 Alicia Spinnet          1989 female Gryffindor TRUE  
3 Angelina Johnson        1989 female Gryffindor TRUE  
4 Anthony Goldstein       1991 male   Ravenclaw  TRUE  
# ℹ 60 more rows

── Changes

# A tibble: 81 × 4
   time  node var    value
  <int> <int> <chr>  <lgl>
1     2     9 active TRUE 
2     2    21 active TRUE 
3     2    35 active TRUE 
4     2    39 active FALSE
# ℹ 77 more rows

── Ties

# A tibble: 74 × 3
   from    to  wave
  <int> <int> <dbl>
1    11    11     6
2    11    25     6
3    11    56     6
4    11    58     6
# ℹ 70 more rows

2.1 Understanding the data

As with any analysis, we start by understanding our data. In SNA projects, we can manipulate the edge data, calculate structural characteristics of the network (e.g., centrality measures of nodes), and learn about the attributes of the actors.

Before doing these inspections, we need to use the pointer function activate() to tell R which data, either nodes or edges, to work on.
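As a minimal sketch of how the pointer behaves, the example below switches the active table on a hypothetical three-node graph (the object `g` and the `tie_id` column are made up for illustration). tidygraph's `active()` reports which table is currently active.

```r
library(tidygraph)
library(dplyr)

# Hypothetical three-node directed chain
g <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c")),
  edges = data.frame(from = c(1, 2), to = c(2, 3)),
  directed = TRUE
)

active(g)                          # the node table is active by default

g_edges <- g %>% activate(edges)   # point to the edge table instead
active(g_edges)

# Verbs such as mutate() act on whichever table is active
g_edges <- g_edges %>% mutate(tie_id = row_number())

# as_tibble() returns the active table, here the edges
g_edges %>% as_tibble()
```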

2.2 Manipulating edges data

Here is another way, the tidy way, to extract the edgelist data.

hp.6_edgelist <- hp.6 %>%
  activate(edges) %>%
  as_tibble()

# Alternatively: as.list(hp.6)$edges

From a quick look at the edgelist, you might have noticed that self-nomination ties (self-loops) are included. A person can surely help themselves, but in some cases we don’t want self-nominations.

How do we remove these self-nomination ties in R? This is equivalent to a data manipulation task we learnt at the beginning of this class. Here we work with the filter() function again.

hp.6_no_self <- hp.6 %>%
  activate(edges) %>%
  filter(from != to) # exclude the edges where the "from" and "to" columns have the same value.
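To check that such a filter does what we expect, here is a minimal sketch on a hypothetical three-node graph with two self-loops. It also shows tidygraph's `edge_is_loop()` helper as an alternative way to express the same condition; the object names are made up for illustration.

```r
library(tidygraph)
library(dplyr)

# Hypothetical graph: a -> a and c -> c are self-loops
loop_demo <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c")),
  edges = data.frame(from = c(1, 1, 2, 3),
                     to   = c(1, 2, 3, 3)),
  directed = TRUE
)

# Option 1: compare the two endpoint columns
no_loops_a <- loop_demo %>%
  activate(edges) %>%
  filter(from != to)

# Option 2: use tidygraph's edge query helper
no_loops_b <- loop_demo %>%
  activate(edges) %>%
  filter(!edge_is_loop())

# Both versions should keep only the two ties between different nodes
nrow(as_tibble(no_loops_a))
nrow(as_tibble(no_loops_b))
```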

2.3 The importance of nodes

Centrality measures quantify the importance or influence of nodes. Let’s calculate the out-degree centrality of the characters and find the five most helpful characters.

top5_offer_help <- hp.6_no_self %>%
  activate(nodes) %>%
  mutate(out_degree = centrality_degree(mode = "out")) %>%
  top_n(5, out_degree) %>% # selects the top 5 nodes, keeping ties (i.e., nodes with the same out-degree)
  select(name, out_degree) %>%
  arrange(desc(out_degree)) %>%
  as_tibble()

top5_offer_help

# A tibble: 9 × 2
  name               out_degree
  <chr>                   <dbl>
1 Harry James Potter         10
2 Ronald Weasley              8
3 Hermione Granger            6
4 Ginny Weasley               5
5 Dean Thomas                 3
6 Fred Weasley                3
7 Luna Lovegood               3
8 Neville Longbottom          3
9 Seamus Finnigan             3

Unsurprisingly, the Trio are the most active helpers.
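The table above has nine rows rather than five because top_n() keeps every node tied at the cutoff value. As a hedged sketch of how to control this, the example below recomputes out-degree on the six-node toy edgelist from Section 1 and uses dplyr's `slice_max()` on the resulting tibble, where the `with_ties` argument makes the choice explicit. The object names are introduced here for illustration.

```r
library(tidygraph)
library(dplyr)

# Toy edgelist from Section 1 (node attributes omitted)
toy <- tbl_graph(
  nodes = data.frame(node = 1:6),
  edges = data.frame(
    from = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6),
    to   = c(2, 3, 1, 3, 2, 4, 6, 5, 6, 4, 3, 4)
  ),
  directed = TRUE
)

# Out-degree = number of ties each node sends
toy_out <- toy %>%
  activate(nodes) %>%
  mutate(out_degree = centrality_degree(mode = "out")) %>%
  as_tibble()

# Keep ties at the cutoff: may return more than n rows
top2_with_ties <- toy_out %>% slice_max(out_degree, n = 2, with_ties = TRUE)

# Drop ties at the cutoff: returns exactly n rows
top2_strict <- toy_out %>% slice_max(out_degree, n = 2, with_ties = FALSE)

nrow(top2_with_ties)
nrow(top2_strict)
```

Whether to keep ties is a substantive choice: dropping them returns a fixed-length list, but which of the tied nodes survives then depends on row order.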

2.3.1 Inspecting nodes’ ties

Wait, who are these nodes? Let’s see the HP characters.

hp.6 %>%
  activate(nodes) %>%
  as_tibble() %>%
  pull(name)

 [1] "Adrian Pucey"           "Alicia Spinnet"         "Angelina Johnson"      
 [4] "Anthony Goldstein"      "Blaise Zabini"          "C. Warrington"         
 [7] "Cedric Diggory"         "Cho Chang"              "Colin Creevey"         
[10] "Cormac McLaggen"        "Dean Thomas"            "Demelza Robins"        
[13] "Dennis Creevey"         "Draco Malfoy"           "Eddie Carmichael"      
[16] "Eleanor Branstone"      "Ernie Macmillan"        "Euan Abercrombie"      
[19] "Fred Weasley"           "George Weasley"         "Ginny Weasley"         
[22] "Graham Pritchard"       "Gregory Goyle"          "Hannah Abbott"         
[25] "Harry James Potter"     "Hermione Granger"       "Jimmy Peakes"          
[28] "Justin Finch-Fletchley" "Katie Bell"             "Kevin Whitby"          
[31] "Lavender Brown"         "Leanne"                 "Lee Jordan"            
[34] "Lucian Bole"            "Luna Lovegood"          "Malcolm Baddock"       
[37] "Mandy Brocklehurst"     "Marcus Belby"           "Marcus Flint"          
[40] "Michael Corner"         "Miles Bletchley"        "Millicent Bulstrode"   
[43] "Natalie McDonald"       "Neville Longbottom"     "Oliver Wood"           
[46] "Orla Quirke"            "Owen Cauldwell"         "Padma Patil"           
[49] "Pansy Parkinson"        "Parvati Patil"          "Penelope Clearwater"   
[52] "Percy Weasley"          "Peregrine Derrick"      "Roger Davies"          
[55] "Romilda Vane"           "Ronald Weasley"         "Rose Zeller"           
[58] "Seamus Finnigan"        "Stewart Ackerley"       "Susan Bones"           
[61] "Terry Boot"             "Theodore Nott"          "Vincent Crabbe"        
[64] "Zacharias Smith"

We can find the ties of a node using the node’s name. To do so, we use the join family of functions to link the node and edge data.

# Assign IDs to nodes 
hp.6_with_id <- hp.6 %>%
  activate(nodes) %>% 
  mutate(id = row_number()) # mutate() can be directly applied to the active node data in tidygraph

# Activate edges and join with nodes data to get the "name" variable
hp.6_edges_with_names.df <- hp.6_with_id %>%
  activate(edges) %>%
  as_tibble() %>% # again, we need to restructure the edgelist to a dataframe to run the following functions.
  left_join(hp.6_with_id %>% activate(nodes), 
            by = c("from" = "id"), copy = TRUE) %>% # join/attach the names of the support senders
  rename(from_name = name) %>%
  left_join(hp.6_with_id %>% activate(nodes), 
            by = c("to" = "id"), copy = TRUE) %>% # join/attach the names of the support receivers
  rename(to_name = name) %>% 
  select(from:from_name, to_name)
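The same idea can be sketched more compactly by first reducing the node data to a small lookup table with only an ID and a name, so that no other node attributes are carried into the join. The graph and names below are hypothetical and only illustrate the pattern:

```r
library(tidygraph)
library(dplyr)

# Hypothetical three-person support network
g <- tbl_graph(
  nodes = data.frame(name = c("Ann", "Bob", "Cy")),
  edges = data.frame(from = c(1, 2, 3), to = c(2, 3, 1)),
  directed = TRUE
)

# Lookup table: row position in the node table -> name
node_lookup <- g %>%
  activate(nodes) %>%
  as_tibble() %>%
  mutate(id = row_number()) %>%
  select(id, name)

# Attach sender and receiver names to each tie
edges_named <- g %>%
  activate(edges) %>%
  as_tibble() %>%
  left_join(node_lookup, by = c("from" = "id")) %>%
  rename(from_name = name) %>%
  left_join(node_lookup, by = c("to" = "id")) %>%
  rename(to_name = name)

edges_named
```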

Now we can check the ties of a specific node. Let’s see the help offered or received by Harry Potter.

hp.6_edges_with_names.df %>%
  filter(from_name == "Harry James Potter" | to_name == "Harry James Potter") %>%
  select(from_name, to_name)

# A tibble: 20 × 2
   from_name          to_name           
   <chr>              <chr>             
 1 Dean Thomas        Harry James Potter
 2 Fred Weasley       Harry James Potter
 3 George Weasley     Harry James Potter
 4 Ginny Weasley      Harry James Potter
 5 Harry James Potter Demelza Robins    
 6 Harry James Potter Fred Weasley      
 7 Harry James Potter George Weasley    
 8 Harry James Potter Ginny Weasley     
 9 Harry James Potter Harry James Potter
10 Harry James Potter Hermione Granger  
11 Harry James Potter Katie Bell        
12 Harry James Potter Leanne            
13 Harry James Potter Luna Lovegood     
14 Harry James Potter Neville Longbottom
15 Harry James Potter Ronald Weasley    
16 Hermione Granger   Harry James Potter
17 Luna Lovegood      Harry James Potter
18 Neville Longbottom Harry James Potter
19 Ronald Weasley     Harry James Potter
20 Seamus Finnigan    Harry James Potter

Activity 1

Can you find who received the most help? We need to calculate the nodes’ in-degree centrality, which indicates these characters’ popularity, or the extent to which they receive support from others in the network.

top5_offer_received <- hp.6_no_self %>%
  activate(nodes) %>%
  mutate(in_degree = centrality_degree(mode="in")) %>% 
  top_n(5, in_degree) %>% 
  select(name, in_degree) %>% 
  arrange(desc(in_degree)) %>% 
  as_tibble() 
  
top5_offer_received

What else can you learn from the actors’ centrality measures? For example, an actor with high out-degree centrality but low in-degree centrality may be a key provider of help but may not receive much support in return. On the other hand, an actor with high in-degree centrality but low out-degree centrality may be a frequent recipient of help but may not actively offer assistance to others.
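One way to make this comparison concrete is to compute both measures side by side and take their difference. The sketch below does so on a hypothetical four-node network; the `net_support` column is a name introduced here for illustration, not an established measure.

```r
library(tidygraph)
library(dplyr)

# Hypothetical network: "a" gives a lot of support and receives little
support <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c", "d")),
  edges = data.frame(from = c(1, 1, 1, 2, 3),
                     to   = c(2, 3, 4, 1, 2)),
  directed = TRUE
)

degree_profile <- support %>%
  activate(nodes) %>%
  mutate(out_degree  = centrality_degree(mode = "out"),  # support given
         in_degree   = centrality_degree(mode = "in"),   # support received
         net_support = out_degree - in_degree) %>%       # positive = net provider
  as_tibble() %>%
  arrange(desc(net_support))

degree_profile
```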

Activity 2

Can you calculate the betweenness centrality and find the characters who are important in facilitating the flow of help or resources through the network?

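top5_bridge <- hp.6_no_self %>%
  activate(nodes) %>%
  mutate(betweenness = centrality_betweenness(directed = TRUE)) %>%
  top_n(5, betweenness) %>% # rank by the betweenness column created above
  select(name, betweenness) %>%
  arrange(desc(betweenness)) %>%
  as_tibble()

top5_bridge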
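To build intuition for what betweenness measures, here is a minimal sketch on a hypothetical three-node chain. Node "b" lies on the only shortest path from "a" to "c", so it is the only node that should receive a non-zero score.

```r
library(tidygraph)
library(dplyr)

# Hypothetical directed chain: a -> b -> c
chain <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c")),
  edges = data.frame(from = c(1, 2), to = c(2, 3)),
  directed = TRUE
)

chain_betweenness <- chain %>%
  activate(nodes) %>%
  mutate(betweenness = centrality_betweenness(directed = TRUE)) %>%
  as_tibble()

chain_betweenness
```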

2.4 The attributes of nodes

While tidygraph provides its own set of functions for data manipulation, such as mutate() and filter(), there are situations where we might want to use other data manipulation functions from the dplyr package, such as group_by() and summarise(). However, these dplyr functions are designed to work with data frames or tibbles, not directly with graph objects.

To bridge this compatibility gap, we use the as_tibble() function to convert the nodes or edges of a graph object into a tibble format. By applying as_tibble() after activating the nodes or edges with activate(), we create a tabular data structure that is compatible with dplyr functions.

See the example below, which computes the house distribution of characters.

hp.6_no_self %>% 
  activate(nodes) %>% 
  as_tibble() %>% 
  group_by(house) %>% 
  summarise(n=n()) %>% 
  mutate(proportion=round(n/sum(n),2))

# A tibble: 4 × 3
  house          n proportion
  <chr>      <int>      <dbl>
1 Gryffindor    25       0.39
2 Hufflepuff    11       0.17
3 Ravenclaw     13       0.2 
4 Slytherin     15       0.23

Activity 3

What’s the gender distribution within each house?
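One possible approach is to count house-gender combinations and then compute proportions within each house. The sketch below shows the pattern on a small hypothetical node table; to apply it to the case study, one would presumably start from the node tibble of hp.6_no_self instead.

```r
library(dplyr)

# Hypothetical node attributes, made up for illustration
hypothetical_nodes <- tibble(
  house  = c("Gryffindor", "Gryffindor", "Gryffindor", "Slytherin", "Slytherin"),
  gender = c("female", "male", "male", "female", "male")
)

gender_by_house <- hypothetical_nodes %>%
  count(house, gender) %>%                     # one row per house-gender pair
  group_by(house) %>%
  mutate(proportion = round(n / sum(n), 2)) %>% # share within each house
  ungroup()

gender_by_house
```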